MIMIC: a voice-adaptive phonetic-tree speech synthesiser
Abstract
This paper presents Mimic, a decision-tree-based, concatenative, voice-adaptive text-to-speech synthesiser. Mimic integrates text-to-speech (TTS) synthesis with speech recognition and speaker adaptation. Speech is synthesised by concatenating triphone synthesis units. The triphone units are obtained from clusters of training examples that are modelled, labelled and segmented using clustered HMMs and Viterbi segmentation. The prosodic structure of the pitch, duration and energy contours is captured using statistical training methods. The concept of a decision-tree-based statistical micro-prosody model is introduced as a hierarchical method of modelling prosodic parameters. The voice adaptation component adapts the spectral parameters as well as pitch, duration and energy.
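As a rough illustration of what a triphone synthesis unit is, the sketch below expands a phone sequence into context-dependent labels. The `left-centre+right` label format, the `sil` boundary symbol and the function name `to_triphones` are assumptions made for this example; the abstract does not specify Mimic's unit inventory or naming scheme.

```python
from typing import List


def to_triphones(phones: List[str], boundary: str = "sil") -> List[str]:
    """Label each phone with its left and right neighbours.

    The "left-centre+right" format and the "sil" boundary symbol are
    illustrative assumptions, not taken from the Mimic paper.
    """
    # Pad both ends so the first and last phones also get a full context.
    padded = [boundary] + phones + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


if __name__ == "__main__":
    # Hypothetical phone sequence for the word "cat".
    print(to_triphones(["k", "ae", "t"]))
```

In a concatenative system of this kind, each such label would index a cluster of recorded examples from which one unit is chosen and joined to its neighbours.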
Similar resources
An HMM-based speech synthesiser using glottal post-filtering
Control over voice quality, e.g. breathy and tense voice, is important for speech synthesis applications. For example, transformations can be used to modify aspects of the voice related to the speaker's identity and to improve expressiveness. However, it is hard to modify voice characteristics of the synthetic speech without degrading speech quality. State-of-the-art statistical speech synthesiser...
Development of an emotional speech synthesiser in Spanish
Currently, an essential point in speech synthesis is addressing the variability of human speech. One of the main sources of this diversity is the emotional state of the speaker. Most of the recent work in this area has focused on the prosodic aspects of speech and on rule-based formant-synthesis experiments. Even when adopting an improved voice source, we cannot achieve a smiling hap...
Current status of the IBM Trainable Speech Synthesis System
This paper describes the current status of the IBM Trainable Speech Synthesis System. The system is a state-of-the-art, trainable, unit-selection based concatenative speech synthesiser. The system uses hidden Markov models (HMMs) to provide a phonetic transcription and HMM state alignment of a database of single-speaker continuous-speech training data. The runtime synthesiser uses the HMM state...
Efficient Diphone Database Creation for MBROLA, a Multilingual Speech Synthesiser
Diphone synthesis is a convenient way of testing phonetic models of human speech. It allows easy manipulation of duration and pitch, so it is used not only for general intonation contour evaluation but also for expressive speech synthesis. The main advantage of using MBROLA [9], [11], [12], [13] is the fact that not all the diphones need to be contained in the voice to test speech models. ...
Automatic intonation modeling with INTSINT
Accurate intonation modeling has become a vital part of modern-day speech synthesis systems. This is especially true for tonal languages such as isiZulu, where the intonation of an utterance not only influences the perceived naturalness of the synthetic voice, but may also influence its semantics. In this work we explore the INTSINT intonation modeling algorithm and its application to an isiZul...
Publication date: 1998